Module 01

Reserve the first level headings (#) for the start of a new Module. This will help to organize your portfolio in an intuitive fashion.
Note: Please edit this template to your heart’s content. This is meant to be the armature upon which you build your individual portfolio. You do not need to keep this instructive text in your final portfolio, although you do need to keep module and assignment names so we can identify what is what.

Module 01 Portfolio Content

  • Evidence worksheet_01
    • Completion status: X
    • Comments:
  • Evidence worksheet_02
    • Completion status: X
    • Comments:
  • Evidence worksheet_03
    • Completion status: X
    • Comments:
  • Problem Set_01
    • Completion status: X
    • Comments:
  • Problem Set_02
    • Completion status: X
    • Comments:
  • Writing assessment_01
    • Completion status:
    • Comments:
  • Additional Readings
    • Completion status:
    • Comments: Need links.

Data science Friday

Data Science

  • Installation check
    • Completion status: X
    • Comments:
  • Portfolio repo setup
    • Completion status: X
    • Comments:
  • RMarkdown Pretty PDF Challenge
    • Completion status: X
    • Comments:
  • ggplot
    • Completion status: 9/10
    • Comments:
    • Exercise 3 - check your plot title. Are you showing phyla?
    • Exercise 4 - explore the ‘scales’ parameter within facet to allow the y axes to change scale in the separate facets.

The remaining second level headers (##) are for separating data science Friday, regular course, and project content. In this module, you will only need to include data science Friday and regular course content; projects will come later in the course.

Installation check

Third level headers (###) should be used for links to assignments, evidence worksheets, problem sets, and readings, as seen here.

Use this space to include your installation screenshots.

Portfolio repo setup

Detail the code you used to create, initialize, and push your portfolio repo to GitHub. This will be helpful as you will need to repeat many of these steps to update your portfolio throughout the course.

In Git: mkdir MICB425_portfolio

cd MICB425_portfolio

Create repository on GitHub page.

git init

git add .

git commit -m "First commit"

git remote add origin https://remote_repository_URL

git remote -v

git push -u origin master

RMarkdown pretty html challenge

Paste your code from the in-class activity of recreating the example html.

R Markdown PDF Challenge

The following assignment is an exercise for the reproduction of this .html document using the RStudio and RMarkdown tools we’ve shown you in class. Hopefully by the end of this, you won’t feel at all the way this poor PhD student does. We’re here to help, and when it comes to R, the internet is a really valuable resource. This open-source program has all kinds of tutorials online.

http://phdcomics.com/ Comic posted 1-17-2018

Challenge Goals

The goal of this R Markdown html challenge is to give you an opportunity to play with a bunch of different RMarkdown formatting. Consider it a chance to flex your RMarkdown muscles. Your goal is to write your own RMarkdown that rebuilds this html document as close to the original as possible. So, yes, this means you get to copy my irreverent tone exactly in your own Markdowns. It’s a little window into my psyche. Enjoy =)
hint: go to the PhD Comics website to see if you can find the image above
If you can’t find the exact image, just find a comparable from the PhD Comics website and include it in your markdown

Here’s a header!

Let’s be honest, this header is a little arbitrary. But show me that you can reproduce headers with different levels please. This is a level 3 header, for your reference (you can most easily tell this from the table of contents).

Another header, now with maths

Perhaps you’re already really confused by the whole markdown thing. Maybe you’re so confused that you’ve forgotten how to add. Never fear! A calculator, R, is here:

1231521+12341556280987
## [1] 1.234156e+13

Table Time

Or maybe, after you’ve added those numbers, you feel like it’s about time for a table! I’m going to leave all the guts of the coding here so you can see how libraries (R packages) are loaded into R (more on that later). It’s not terribly pretty, but it hints at how R works and how you will use it in the future. The summary function used below is a nice data exploration function that you may use in the future.

library(knitr)
kable(summary(cars),caption="I made this table with kable in the knitr package library")
I made this table with kable in the knitr package library
    speed           dist
Min.   : 4.0    Min.   :  2.00
1st Qu.:12.0    1st Qu.: 26.00
Median :15.0    Median : 36.00
Mean   :15.4    Mean   : 42.98
3rd Qu.:19.0    3rd Qu.: 56.00
Max.   :25.0    Max.   :120.00

And now you’ve almost finished your first RMarkdown! Feeling excited? We are! In fact, we’re so excited that maybe we need a big finale eh?
Here’s ours! Include a fun gif of your choice!

Silicon Valley

Origins and Earth Systems

Evidence worksheet 01

The template for the first Evidence Worksheet has been included here. The first thing for any assignment should be link(s) to any relevant literature (which should be included as full citations in a module references section below).

You can copy-paste in the answers you recorded when working through the evidence worksheet into this portfolio template.

As you include Evidence worksheets and Problem sets in the future, ensure that you delineate Questions/Learning Objectives/etc. by using headers that are 4th level and greater. This will still create header markings when you render (knit) the document, but will exclude these levels from the Table of Contents. That’s a good thing. You don’t want to clutter the Table of Contents too much.

Whitman et al 1998

Learning objectives

Describe the numerical abundance of microbial life in relation to ecology and biogeochemistry of Earth systems.

General questions

  • What were the main questions being asked?
    What is the abundance of prokaryotes on earth? What is the total amount of cellular carbon produced by these prokaryotes on earth?

  • What were the primary methodological approaches used?
    To count prokaryotes
    • aquatic environments: used cellular density
    • soil: direct counts from a coniferous forest ultisol (cells/g)
      • unpublished field studies of E. A. Paul for cultivated soils
      • terrestrial subsurface
      • unconsolidated sediments represent most of marine subsurface and have been determined
      • assuming that average porosity of terrestrial subsurface is 3%
      • estimation from groundwater data based on values from seven sites and four studies

    Other habitats:
    • animals
      • human: cell density of prokaryotes on the skin multiplied by skin surface area
      • insects such as termites: number of insects multiplied by the number of prokaryotes per insect
    • leaves: can be estimated by assuming a dense population and a high leaf area index
    • air: pre-calculated

    Carbon content:
    • estimated from cell numbers in soil, aquatic systems, and the subsurface
    • soil and subsurface: cellular carbon is assumed to be one-half of dry weight, so the average dry weight of a prokaryotic cell is multiplied by the number of cells
    • aquatic systems: average cellular carbon for sedimentary and planktonic prokaryotes is assumed to be 10 and 20 fg of C/cell respectively, which is then multiplied by the number of cells in aquatic systems

  • Summarize the main results or findings.
    • Total number of prokaryotes is 4-6 x 10^30 cells, containing 350-500 Pg of C (1 Pg = 10^15 g)
    • Prokaryotes represent the largest pool of nutrients such as N and P. Essentially, prokaryotic biomass is a major contributor to the total biosphere
  • Do new questions arise from the results?
    • what is the genetic diversity of these prokaryotes?
    • The number of prokaryotic species? How does prokaryotic turnover affect carbon fixation and carbon cycle?
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
    • bombards you with numbers
    • inadequate explanation of some assumptions made especially when estimating carbon content
    • can be found in literature however
    • assumed that the papers they cited had the proper methods
    • lots of estimation of cell densities

Evidence Worksheet_02 Life and the Evolution of Earth’s Atmosphere

Learning objectives:

Comment on the emergence of microbial life and the evolution of Earth systems

  • Indicate the key events in the evolution of Earth systems at each approximate moment in the time series. If times need to be adjusted or added to the timeline to fully account for the development of Earth systems, please do so.

    • 4.6 billion years ago
      Formation of Earth

    • 4.5 billion years ago
      Moon was formed to give Earth spin & tilt, day & night cycles, seasons

    • 4.4 billion years ago
      oldest mineral found (zircon)

    • 4.1 billion years ago
      earliest evidence of life in zircon

    • 3.8 billion years ago
      meteor bombardment stops
      Sedimentary rocks: weathering, ocean
      carbon isotopes also in graphite
      iron rich sedimentary rocks

    • 3.5 billion years ago
      Photosynthesis: ambiguous microfossils
      stromatolites (organosedimentary structures produced by microbial trapping, usually but not always photosynthetic)

    • 3.0 billion years ago
      Glaciation: Earth would have appeared brown

    • 2.2 billion years ago
      oxygen levels increased sharply
      rock recognized as redbeds -> evidence for oxidation

    • 2.1 billion years ago
      end of Snowball Earth

    • 1.9 billion years ago
      Eukaryote emergence

    • 550 million years ago
      Cambrian explosion

    • 400 million years ago
      emergence of land plants

    • 200,000 years ago
      H. sapiens appear

  • Describe the dominant physical and chemical characteristics of Earth systems at the following waypoints:

    • Hadean
      extremely hot, >100 °C ocean temperature
      seawater chemistry controlled by volcanism

    • Archean
      methanogenesis (early); Greenhouse effect because of CH4 and CO2

    • Precambrian
      reducing atmosphere
      glaciation ended as the greenhouse effect was enhanced by volcanoes
      CO2 levels hundreds of times higher than now

    • Proterozoic
      Snowball Earth
      accumulation of oxygen in the Earth’s atmosphere
      filling of chemical sinks and increased carbon burial
      nitrogen concentration close to modern levels

    • Phanerozoic carboniferous period
      four separate glaciation periods
      higher oxygen levels

Evidence worksheet 03

Rockstrom et al 2009

Learning objectives

Evaluate human impacts on the ecology and biogeochemistry of Earth systems.

General questions

  • What were the main questions being asked?
    • How should the preconditions for human development be defined?
    • What are the consequences of crossing certain biophysical thresholds?
    • Have three of the nine interlinked planetary boundaries already been overstepped?

  • What were the primary methodological approaches used?
    • Thresholds can be defined by a critical value for one or more control variables (e.g. [CO2])

  • Summarize the main results or findings.
    • Nine processes for which planetary boundaries need to be defined:
      1. Climate change
      2. Rate of biodiversity loss
      3. Nitrogen and phosphorus cycle interference
      4. Stratospheric ozone depletion
      5. Ocean acidification
      6. Global fresh-water use
      7. Change in land use
      8. Chemical pollution
      9. Atmospheric aerosol loading
    • Three were transgressed: 1, 2, 3
    • Human changes to atmospheric CO2 concentrations should not exceed 350 parts per million by volume
    • Extinction rate as an alternative indicator: planetary boundary for biodiversity of ten times the background rate
    • Nitrogen "valve" should contain the flow of new reactive nitrogen to 25% of its current value
    • No more than 11 million tonnes of phosphorus per year should be dumped into the ocean

  • Do new questions arise from the results?
    • How do humans go about staying within these boundaries?
    • Are these goals realistic?
    • What are some preliminary measures we can consider to reach these goals?

  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?

Problem set 01

Learning objectives:

Describe the numerical abundance of microbial life in relation to the ecology and biogeochemistry of Earth systems.

Specific questions:

  • What are the primary prokaryotic habitats on Earth and how do they vary with respect to their capacity to support life? Provide a breakdown of total cell abundance for each primary habitat from the tables provided in the text.

    a. Aquatic: 1.18 x 10^29
    b. Soil: 2.556 x 10^29
    c. Subsurface: 3.8 x 10^30
  • What is the estimated prokaryotic cell abundance in the upper 200 m of the ocean and what fraction of this biomass is represented by marine cyanobacterium including Prochlorococcus? What is the significance of this ratio with respect to carbon cycling in the ocean and the atmospheric composition of the Earth?
    • upper 200 m of the ocean: 3.6 x 10^28 cells
    • cyanobacteria: (4 x 10^4 cells/ml) / (5 x 10^5 cells/ml) x 100 = 8%

  • What is the difference between an autotroph, heterotroph, and a lithotroph based on information provided in the text?
    a. autotroph: “self-nourishing”, fix inorganic carbon (CO2) -> biomass
    b. heterotroph: assimilate organic carbon
    c. lithotroph: use inorganic substances
  • Based on information provided in the text and your knowledge of geography what is the deepest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this depth?

Since temperature rises by about 22 degrees per km of depth, the deepest habitat that can support life is the bottom of the Mariana Trench (10.9 km) plus an extra 5 km below it.

  • Based on information provided in the text and your knowledge of geography what is the highest habitat capable of supporting prokaryotic life? What is the primary limiting factor at this height?
    22 km on top of the 8.8 km on Mt. Everest. A limiting factor at that height would be obtaining enough nutrients.

  • Based on estimates of prokaryotic habitat limitation, what is the vertical distance of the Earth’s biosphere measured in km?

22 + 8.8 + 10.9 + 5 = 46.7 km
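The sum above can be double-checked with a few lines of Python. This is only a sketch of the arithmetic: the variable names are illustrative, and the four distances are the ones given in the answers to the previous questions.

```python
# Distances in km, taken from the answers above (names are illustrative).
above_everest = 22.0    # habitable column above the summit of Mt. Everest
everest_height = 8.8    # height of Mt. Everest above sea level
mariana_depth = 10.9    # depth of the Mariana Trench below sea level
below_trench = 5.0      # habitable depth below the trench floor

# Vertical extent of the biosphere is the sum of all four segments.
biosphere_km = above_everest + everest_height + mariana_depth + below_trench
print(f"Vertical extent of the biosphere: {biosphere_km:.1f} km")
```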

  • How was annual cellular production of prokaryotes described in Table 7 column four determined? (Provide an example of the calculation)
    • 3.6 x 10^28 / 16 x 365 = 8.2 x 10^29 cells/yr
    • population size divided by turnover time in days, times 365 days

  • What is the relationship between carbon content, carbon assimilation efficiency and turnover rates in the upper 200m of the ocean? Why does this vary with depth in the ocean and between terrestrial and marine habitats?
    • carbon assimilation efficiency is 20%
    • 5-20 fg C/cell
    • (3.6 x 10^28 cells)(20 x 10^-30 Pg C/cell) = 0.72 Pg of C in marine heterotrophs
  • How were the frequency numbers for four simultaneous mutations in shared genes determined for marine heterotrophs and marine autotrophs given an average mutation rate of 4 x 10^-7 per DNA replication? (Provide an example of the calculation with units. Hint: cell and generation cancel out)
    • 4 x 10^-7 mutations/generation
    • (4 x 10^-7)^4 = 2.56 x 10^-26 mutations/generation
    • 365/16 = 22.8 turnovers/yr
    • (3.6 x 10^28 cells) x 22.8 = 8.2 x 10^29 cells/yr
    • (8.2 x 10^29 cells/yr)(2.56 x 10^-26 mutations/generation) = 2.1 x 10^4 mutations/yr
  • Given the large population size and high mutation rate of prokaryotic cells, what are the implications with respect to genetic diversity and adaptive potential? Are point mutations the only way in which microbial genomes diversify and adapt?

Prokaryotes would have high genetic diversity and the ability to adapt quickly due to their high mutation rate. Insertions and deletions are generally detrimental to a gene’s function since they shift the reading frame, so point mutations tend to be the most common, but there is potential for these other types of mutation to promote genetic diversity as well.

  • What relationships can be inferred between prokaryotic abundance, diversity, and metabolic potential based on the information provided in the text?

High prokaryotic abundance encourages the diversification of metabolic capabilities in prokaryotes. A larger population is likely to accumulate more mutations, some of which allow prokaryotes to take fuller advantage of their environment and compete for different resources.
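The turnover, carbon, and mutation arithmetic used in this problem set can be checked with a short script. This is a minimal sketch in Python, not the method of the paper itself. The variable names are illustrative, and the inputs (3.6 x 10^28 cells, a 16-day turnover time, 20 fg C per cell, and a mutation rate of 4 x 10^-7 per DNA replication) are the figures quoted in the questions and answers above.

```python
# Inputs quoted in this problem set (marine heterotrophs, upper 200 m).
population = 3.6e28        # cells
turnover_days = 16         # days per population turnover
carbon_per_cell_fg = 20    # fg C per cell (upper end of the 5-20 fg range)
mutation_rate = 4e-7       # mutations per DNA replication

FG_PER_PG = 1e30           # 1 Pg = 1e15 g and 1 g = 1e15 fg

# Annual cellular production: population times turnovers per year.
turnovers_per_year = 365 / turnover_days
cells_per_year = population * turnovers_per_year

# Standing carbon stock: cells times carbon per cell, converted from fg to Pg.
carbon_pg = population * carbon_per_cell_fg / FG_PER_PG

# Four simultaneous mutations: the single-mutation rate raised to the 4th
# power, then scaled by the number of cells produced each year.
four_mutation_rate = mutation_rate ** 4
events_per_year = cells_per_year * four_mutation_rate

print(f"turnovers per year:            {turnovers_per_year:.1f}")
print(f"cells produced per year:       {cells_per_year:.1e}")
print(f"carbon stock (Pg C):           {carbon_pg:.2f}")
print(f"four-mutation frequency:       {four_mutation_rate:.2e}")
print(f"four-mutation events per year: {events_per_year:.1e}")
```

Keeping the unit conversion in a single named constant (`FG_PER_PG`) makes it easier to see where the factor of 10^-30 in the carbon calculation comes from.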

Problem set_02 “Microbial Engines”

Learning objectives:

Discuss the role of microbial diversity and formation of coupled metabolism in driving global biogeochemical cycles.

Specific Questions:

  • What are the primary geophysical and biogeochemical processes that create and sustain conditions for life on Earth? How do abiotic versus biotic processes vary with respect to matter and energy transformation and how are they interconnected?
    • The primary geophysical processes are tectonics and atmospheric photochemical processes, which together create and sustain geochemical cycles.
    • abiotic processes are based on acid/base chemistry
    • biotic processes are dependent on redox reactions
    • abiotic processes are a source of nutrients for biotic reactions
  • Why is Earth’s redox state considered an emergent property?
    • biogeochemical cycles of microbial life evolved to form nested, abiotically driven acid-base reactions and biologically driven redox reactions
    • these reactions over time altered the redox state of the planet
    • result of a collective complex system made up of individual microbes
  • How do reversible electron transfer reactions give rise to element and nutrient cycles at different ecological scales? What strategies do microbes use to overcome thermodynamic barriers to reversible electron flow?
    • the reduction or oxidation of elements allows microbes to assimilate or dissimilate nutrients, leading to the recycling of nutrients
    • thermodynamic barriers are overcome through synergistic cooperation of multispecies assemblages
  • Using information provided in the text, describe how the nitrogen cycle partitions between different redox “niches” and microbial groups. Is there a relationship between the nitrogen cycle and climate change?
    • NH4+ is oxidized in a two-step manner: first a group of Bacteria or Archaea oxidizes ammonia to NO2-, which is then oxidized to NO3- by a different suite of nitrifying bacteria
    • a third set of microbes uses NO2- and NO3- as electron acceptors to form N2
    • incomplete reduction of nitrate or nitrite, caused by excess nitrogen being introduced to the nitrogen cycle, leads to accumulation of nitrous oxide
    • nitrous oxide is a potent greenhouse gas that contributes to global warming

  • What is the relationship between microbial diversity and metabolic diversity and how does this relate to the discovery of new protein families from microbial community genomes?
    • linear relationship between the number of nonredundant (diverse) microbial sequences and the discovery of new protein families
    • but metabolic machinery is highly conserved between microbes, so microbial diversity does not lead to metabolic diversity
  • On what basis do the authors consider microbes the guardians of metabolism?
    • microbes carry a core planetary gene set, dispersed among them through vertical or horizontal gene transfer, which allows them to preserve the metabolic pathways

Module 1 Essay

The microbiology community’s general consensus is that humans would not be able to live without microbes. Falkowski et al. (1) commented that “Microbial life can easily live without us; we, however, cannot survive without the global catalysis and environmental transformations it provides.” It may be bold to assume they are necessary for our survival, but their existence is essential to our current lifestyle. Microbial networks facilitate the biogeochemical processes that cycle our nutrients and maintain a livable atmosphere. They are difficult to replicate because of their complexity and scale, and efforts to emulate them have resulted in environmental damage. Furthermore, their resilience makes them valuable assets in our fight against climate change.

Microbes form metabolic networks that facilitate the biogeochemical processes which fix and cycle our nutrients. Carbon and nitrogen are necessary for the production of the biological building blocks that make up our bodies (2), but they cannot be used as nutrients unless they are converted from their inorganic forms or reduced. Nitrogen can only be incorporated into biological molecules through nitrogen fixation, where nitrogen gas (N2) is reduced to ammonium. Microbes are the only organisms that can accomplish this biotically, since their genes encode the enzyme nitrogenase, a heterodimeric complex that breaks apart the N≡N bond of N2 (1). Similarly, microbes are necessary for the movement of carbon between sinks. Soil stores about three times as much organic carbon as the atmosphere stores inorganic carbon in the form of CO2 (3). If microbial respiration were to cease, current primary production would deplete atmospheric CO2 stocks in 12 years (4) and dramatically decrease the rate of photosynthesis in our crops.

We currently do not have the technological capacity to replace these metabolic networks due to their complexity and scale. Metabolic networks consist of individual redox reactions that are carried out by different macromolecular complexes, which are encoded by many genes or housed in different microbial groups. In oxygenic photosynthesis, 100 genes alone are needed to encode the molecular complexes required for energy transduction (6). To further complicate matters, some pathways in biogeochemical cycles are catalyzed by diverse multispecies microbial interactions. In the nitrogen cycle, NH4+ is first oxidized to NO2- by a group of Bacteria or Archaea, then a different group of nitrifying bacteria oxidizes NO2- to NO3- (7). The scale of these reactions is another challenge we would need to overcome. There are approximately 4-6 x 10^30 prokaryotes on Earth in total (8), and these numbers do not include eukaryotic microorganisms. The sheer abundance of these microorganisms demonstrates that microbial metabolic networks exist at a scale that we may never be able to reconstruct entirely.

Our attempts to emulate some of these metabolic networks have been damaging for the environment and further highlight our limitations. Humans have acquired the ability to fix nitrogen inorganically through fossil fuel combustion, almost doubling the rate of terrestrial nitrogen fixation. The excess NH4+ produced industrially is converted to NO3-, which leaches into water reserves and creates anoxic zones. This has led to a rise in atmospheric N2O, a greenhouse gas that has 300 times the global warming potential of CO2. These environmental damages are a testament to our inability to construct an elegant biochemical network as microbes do. Until we can balance the inputs of our activities with an output that does not alter the climate, we will need to rely on the adaptive capabilities of microbes to produce a new steady state for the biosphere.

Microbes are invaluable allies in our efforts to combat climate change and in our foray into the Anthropocene Era because of their resilience to environmental changes. We have disturbed major Earth-system processes through our interference with the nitrogen cycle and the climate, disturbing the very environmental conditions that enabled our development. To repair the damage, we would require the help of microbes. They can adapt to environmental changes quickly because their large numbers and rapid growth give them the capacity to create genetically diverse groups, granting them the ability to form new metabolic networks. The formation of these new networks can create a new steady state where excess nitrogen or carbon dioxide is removed from the system at the same rate it is added (8). Indeed, up until the Industrial Revolution, the evolution and basic composition of Earth’s atmosphere was tightly linked to the evolution of their metabolic networks (5). Cyanobacteria, which are oxygen producers as well as major nitrogen fixers, have had to evolve complex mechanisms to protect their oxygen-sensitive nitrogenase. Taken together, microbes’ ability to resist environmental changes through evolutionary processes makes them indispensable allies in the fight against human-driven climate change.

In conclusion, microbes are necessary because of the metabolic networks they form. These networks facilitate biogeochemical processes that are critical to our current lifestyle. Moreover, our attempts to mimic these processes have significantly damaged the environment and spurred climate change. The resilience of these metabolic networks to our activities will be instrumental as we enter the Anthropocene Era, but our perturbation of microbial-driven biogeochemical processes could lead to irreversible changes unless we practice restraint.

References

  1. Falkowski PG, Fenchel T, Delong EF. 2008. The microbial engines that drive earth’s biogeochemical cycles. Science 320:1034–1039.
  2. Schlesinger WH. 1997. Biogeochemistry: an analysis of global change – 2nd ed. Acad Press San Diego 139–143.
  3. Falkowski P, Scholes RJ, Boyle E, Canadell J, Canfield D, Elser J, Gruber N, Hibbard K, Hogberg P, Linder S, Mackenzie FT, Moore B, Pedersen T, Rosental Y, Seitzinger S, Smetacek V, Steffen W. 2000. The global carbon cycle: A test of our knowledge of earth as a system. Science 290:291–296.
  4. Sylvia DM, Fuhrmann JJ, Hartel PG, Zuberer DA, Cupples AM. 2005. Principles and applications of soil microbiology. Journal of Environment Quality.
  5. Kasting JF, Siefert JL. 2002. Life and the evolution of earth’s atmosphere. Science 296:1066–1069.
  6. Shi T, Bibby TS, Jiang L, Irwin AJ, Falkowski PG. 2005. Protein interactions limit the rate of evolution of photosynthetic genes in cyanobacteria. Mol Biol Evol 22:2179–2189.
  7. Falkowski PG. 1997. Evolution of the nitrogen cycle and its influence on the biological sequestration of CO2 in the ocean. Nature 387:272–275.
  8. Whitman WB, Coleman DC, Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci 95:6578–6583.
  9. Rockström J, Steffen W, Noone K, Persson Å, Chapin FS, Lambin E, Lenton TM, Scheffer M, Folke C, Schellnhuber HJ, Nykvist B, de Wit CA, Hughes T, van der Leeuw S, Rodhe H, Sörlin S, Snyder PK, Costanza R, Svedin U, Falkenmark M, Karlberg L, Corell RW, Fabry VJ, Hansen J, Walker B, Liverman D, Richardson K, Crutzen P, Foley J. 2009. Planetary boundaries: Exploring the safe operating space for humanity. Ecol Soc 14.

Module 01 references

Utilize this space to include a bibliography of any literature you want associated with this module. We recommend keeping this as the final header under each module.

An example for Whitman and Wiebe (1998) has been included below.

  1. Achenbach J. 2012. Spaceship Earth: a new view of environmentalism. Washington Post 2–5. (https://www.washingtonpost.com/national/health-science/spaceship-earth-a-new-view-of-environmentalism/2011/12/29/gIQAZhH6WP_story.html)
  2. Canfield DE, Glazer AN, Falkowski PG. 2010. The evolution and future of earth’s nitrogen cycle. Science 330:192–196. PMID20929768
  3. Falkowski PG, Fenchel T, Delong EF. 2008. The microbial engines that drive earth’s biogeochemical cycles. Science 320:1034–1039. PMID18497287
  4. Kasting JF, Siefert JL. 2002. Life and the evolution of earth’s atmosphere. Science 296:1066–1069. PMID12004117
  5. Leopold AC, Pickett STA, Ostfeld RS, Shachak M, Likens GE. 1949. The land ethic of Aldo Leopold. J For 5–7.
  6. Nisbet EG, Sleep NH. 2001. The habitat and nature of early life. Nature 409:1083–1091. PMID11234022
  7. Falkowski P, Scholes RJ, Boyle E, Canadell J, Canfield D, Elser J, Gruber N, Hibbard K, Hogberg P, Linder S, Mackenzie FT, Moore B, Pedersen T, Rosental Y, Seitzinger S, Smetacek V, Steffen W. 2000. The global carbon cycle: A test of our knowledge of earth as a system. Science 290:291–296. PMID11030643
  8. Kallmeyer J, Pockalny R, Adhikari RR, Smith DC, D’Hondt S. 2012. Global distribution of microbial abundance and biomass in subseafloor sediment. Proc Natl Acad Sci 109:16213–16216. PMC3479597
  9. Rockström J, Steffen W, Noone K, Persson Å, Chapin FS, Lambin E, Lenton TM, Scheffer M, Folke C, Schellnhuber HJ, Nykvist B, de Wit CA, Hughes T, van der Leeuw S, Rodhe H, Sörlin S, Snyder PK, Costanza R, Svedin U, Falkenmark M, Karlberg L, Corell RW, Fabry VJ, Hansen J, Walker B, Liverman D, Richardson K, Crutzen P, Foley J. 2009. Planetary boundaries: Exploring the safe operating space for humanity. Ecol Soc 14. PMID19779433
  10. Shrag, Geobiology of Anthropocene_2012.pdf.
  11. Breitburg D, Levin LA, Oschlies A, Grégoire M, Chavez FP, Conley DJ, Garçon V, Gilbert D, Gutiérrez D, Isensee K, Jacinto GS, Limburg KE, Montes I, Naqvi SWA, Pitcher GC, Rabalais NN, Roman MR, Rose KA, Seibel BA, Telszewski M, Yasuhara M, Zhang J. 2018. Declining oxygen in the global ocean and coastal waters. Science 359. PMID29301986
  12. Waters CN, Zalasiewicz J, Summerhayes C, Barnosky AD, Poirier C, Gałuszka A, Cearreta A, Edgeworth M, Ellis EC, Ellis M, Jeandel C, Leinfelder R, McNeill JR, Richter DDB, Steffen W, Syvitski J, Vidas D, Wagreich M, Williams M, Zhisheng A, Grinevald J, Odada E, Oreskes N, Wolfe AP. 2016. The Anthropocene is functionally and stratigraphically distinct from the Holocene. Science 351. PMID26744408
  13. Whitman WB, Coleman DC, Wiebe WJ. 1998. Prokaryotes: The unseen majority. Proc Natl Acad Sci 95:6578–6583. PMC33863

Module 2

Module 02 Portfolio Content

  • Evidence worksheet_04
    • Completion status: X
    • Comments:
  • Problem Set_03
    • Completion status: X
    • Comments:
  • Writing assessment_02
    • CANCELED
  • Additional Readings
    • Completion status:
    • Comments:

Evidence worksheet 04

Martinez et al 2007

Learning objectives

Discuss the relationship between microbial community structure and metabolic diversity. Evaluate common methods for studying the diversity of microbial communities. Recognize basic design elements in metagenomic workflows.

General questions

  • What were the main questions being asked?
    • Can light driven ATP synthesis be transferred to a heterologous bacterium in a single genetic event?
    • “To further characterize PR photosystem structure and function.”
  • What were the primary methodological approaches used?
    • Screen for PR-containing clones on retinal-containing LB agar plating medium, by looking for red or orange pigmentation.
    • They found two clones they suspected to have the PR photosystem.
    • The full DNA sequence of the two putative PR photosystem containing fosmids was obtained by sequencing a collection of transposon-insertion clones
    • They analyzed different transposon insertion mutants and looked at the accumulation of intermediates to deduce gene functions. This was done using cell pigmentation and HPLC pigment analyses.
    • They measured pH to determine whether the fosmids independently expressed a functional PR with light-activated proton-translocating activity.
  • Summarize the main results or findings.
    Two fosmids were identified that contained the genes necessary and sufficient for proteorhodopsin-based phototrophy. In E. coli cells carrying these fosmids, both the external pH and the intracellular ATP concentration changed when the cells were exposed to light. Further, the fosmids contained genes sufficient to produce retinal (the PR cofactor) as long as the cells already produced the intermediate FPP, which E. coli and many other bacteria do. Gene copy number affected how readily the pigmentation phenotype could be identified. The clones also had high similarity to other PR-containing BAC clones from Alphaproteobacteria from the Mediterranean and Red Seas.

  • Do new questions arise from the results?
    • How much variation in this grouping of genes is there naturally?
    • Are retinal pathways usually close to the PR gene? Are they transferred together as a rule?
    • Are these genes (retinal pathway + PR) usually located on a plasmid or integrated into the bacterial genome in natural communities?
    • Would we expect to see this gene set distributed across many phyla?
  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
    • The main challenge was that the paper assumes the reader already knows what fosmids are and how they are constructed.

Problem set_03 “Metagenomics: Genomic Analysis of Microbial Communities”

Learning objectives:

Specific emphasis should be placed on the process used to find the answer. Be as comprehensive as possible e.g. provide URLs for web sources, literature citations, etc.
(Reminders for how to format links, etc in RMarkdown are in the RMarkdown Cheat Sheets)

Specific Questions:

  • How many prokaryotic divisions have been described and how many have no cultured representatives (microbial dark matter)?
    • Roughly 60 major lines of descent have been described
    • About half have no cultured representatives
    • As of 2016: 89 bacterial phyla and 20 archaeal phyla, identified via small-subunit (16S) rRNA databases

https://www.nature.com/articles/nature12352

  • How many metagenome sequencing projects are currently available in the public domain and what types of environments are they sourced from?
    • EBI has a total of 1469 projects
    • sourced from biomes such as engineered wastewater, freshwater, host-associated (human or nonhuman), marine, soil (forest)
    • https://www.ebi.ac.uk/metagenomics/projects
    • look up IMG-M projects
  • What types of on-line resources are available for warehousing and/or analyzing environmental sequence information (provide names, URLS and applications)?
  • What is the difference between phylogenetic and functional gene anchors and how can they be used in metagenome analysis?
    • phylogenetic anchors: slowly evolving, highly conserved marker genes (16S rRNA, 18S rRNA) used to predict the taxonomic origin of environmental genomic fragments
    • vertically transmitted
    • typically single-copy; do not need to encode a functional protein
    • https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2367736/
    • functional gene anchors: genes linked directly to a biogeochemical function
    • evolve more quickly; single-copy
    • e.g. RuBisCO
  • What is metagenomic sequence binning? What types of algorithmic approaches are used to produce sequence bins? What are some risks and opportunities associated with using sequence bins for metabolic reconstruction of uncultivated microorganisms?
    • binning: assigning each sequence to the OTU it most likely came from
    • composition-based binning: uses sequence composition, e.g. GC content of the genome or Markov models based on k-mer frequencies
    • risk: closely related OTUs are more likely to be misassigned
    • similarity-based binning: assigns sequences by their similarity to reference sequences
    • risk: only works when sample sequences share significant similarity with the references
  • Is there an alternative to metagenomic shotgun sequencing that can be used to access the metabolic potential of uncultivated microorganisms? What are some risks and opportunities associated with this alternative?
    • functional screens
    • 3rd generation single-cell sequencing
    • FISH: probe for specific sequences
    • nanopore: single-molecule sequencing
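
The composition-based signals named in the binning answer above (GC content and k-mer frequencies) can be illustrated with a minimal base-R sketch. The helper names `gc_content` and `kmer_frequencies` and the two short contigs are hypothetical and exist only for this example; real binning tools use longer k-mers, many contigs, and a clustering or classification step on top of these features.

```r
# Fraction of G and C bases in a sequence
gc_content <- function(seq) {
  bases <- strsplit(toupper(seq), "")[[1]]
  mean(bases %in% c("G", "C"))
}

# Relative frequency of every overlapping k-mer in a sequence
kmer_frequencies <- function(seq, k = 2) {
  seq <- toupper(seq)
  starts <- seq_len(nchar(seq) - k + 1)
  kmers <- substring(seq, starts, starts + k - 1)
  table(kmers) / length(kmers)
}

# Two made-up contigs: one GC-rich, one AT-rich
contig_a <- "ATGCGCGGCCGCTAGCGC"
contig_b <- "ATATTTAAATATTAATAT"

gc_content(contig_a)        # 14 of 18 bases are G or C
gc_content(contig_b)        # no G or C at all
kmer_frequencies(contig_a)  # dinucleotide profile of the GC-rich contig
```

Contigs with similar GC content and k-mer profiles are candidates for the same bin, which is also why closely related populations with near-identical composition are easy to misassign.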

Additional Readings

  1. Madsen EL. 2005. Identifying microorganisms responsible for ecologically significant biogeochemical processes. Nat Rev Microbiol 3:439–446. PMID15864265

  2. Martinez A, Bradley AS, Waldbauer JR, Summons RE, DeLong EF. 2007. Proteorhodopsin photosystem gene expression enables photophosphorylation in a heterologous host. Proc Natl Acad Sci 104:5590–5595. PMID17372221

  3. Taupp M, Mewis K, Hallam SJ. 2011. The art and design of functional metagenomic screens. Curr Opin Biotechnol 22:465–472. PMID21440432

  4. Wooley JC, Godzik A, Friedberg I. 2010. A primer on metagenomics. PLoS Comput Biol 6. PMID20195499

Module 3

Module 03 Portfolio Content

  • Evidence worksheet_05
    • Completion status:
    • Comments:
  • Problem set_04
    • Completion status:
    • Comments:
  • Writing Assessment_03
    • Completion status:
    • Comments:
  • Additional Readings
    • Completion status:
    • Comments:

Project 1

  • CATME account setup and survey
    • Completion status: X
    • Comments:
  • CATME interim group assessment
    • Completion status: X
    • Comments:
  • Project 1
    • Report (80%):
    • Participation (20%):

Evidence worksheet 05

Welch et al 2002

Learning objectives

• Evaluate the concept of microbial species based on environmental surveys and cultivation studies.

• Explain the relationship between microdiversity, genomic diversity and metabolic potential

• Comment on the forces mediating divergence and cohesion in natural microbial communities

• Comment on the creative tension between gene loss, duplication and acquisition as it relates to microbial genome evolution

• Identify common molecular signatures used to infer genomic identity and cohesion

• Differentiate between mobile elements and different modes of gene transfer

General questions

  • What were the main questions being asked?
    • How similar and how different are the genomes of the uropathogenic strain CFT073, the enterohemorrhagic strain EDL933, and the nonpathogenic laboratory strain MG1655? What makes them distinct from one another?

  • What were the primary methodological approaches used?
    • Whole-genome libraries were prepared from genomic DNA.
    • Sequence analysis and annotation.

  • Summarize the main results or findings.
    • Codon usage analysis indicates that a set of backbone E. coli genes shares a common codon bias, forming a conserved framework.
    • Similar virulence genes come into play across the pathogenic strains, but their linkage relationships and chromosomal locations vary.
    • Genomic islands tend to carry adaptive traits.

  • Do new questions arise from the results?
    • Why aren’t virulence plasmids associated with uropathogenic strains even though they are common to many E. coli isolates?
    • Should we define a species based on phenotypic traits?
    • How can we assess deletions that remove genes detrimental to the uropathogenic lifestyle, given the large number of genetic differences?

  • Were there any specific challenges or advantages in understanding the paper (e.g. did the authors provide sufficient background information to understand experimental logic, were methods explained adequately, were any specific assumptions made, were conclusions justified based on the evidence, were the figures or tables useful and easy to understand)?
    • The conclusions were sufficiently justified based on the evidence.

  • Based on your reading and discussion notes, explain the meaning and content of the following figure derived from the comparative genomic analysis of three E. coli genomes by Welch et al. Remember that CFT073 is a uropathogenic strain and that EDL933 is an enterohemorrhagic strain. Explain how this study relates to your understanding of ecotype diversity. Provide a definition of ecotype in the context of the human body. Explain why certain subsets of genes in CFT073 provide adaptive traits under your ecological model and speculate on their mode of vertical descent or gene transfer.
    • Different environments impose different conditions that apply selective pressure on a species, so it diverges into strains.
    • An ecotype is treated here as equivalent to a strain adapted to a particular niche. In the context of the human body, this study includes two pathogenic ecotypes: uropathogenic (urinary tract) and enterohemorrhagic (intestine).
    • I think the pathogenic traits encoded in islands were acquired by horizontal gene transfer, whereas the ancestral backbone genes (i.e. what groups the strains as a species) were inherited vertically.

Problem set_04 “Fine-scale phylogenetic architecture”

Learning objectives:

  • Gain experience estimating diversity within a hypothetical microbial community

Outline:

In class Day 1:

  1. Define and describe species within your group’s “microbial” community.
  2. Count and record individuals within your defined species groups.
  3. Remix all species together to reform the original community.
  4. Each person in your group takes a random sample of the community (i.e. divide up the candy).

Assignment:

  1. Individually, complete a collection curve for your sample.
  2. Calculate alpha-diversity based on your original total community and your individual sample.

In class Day 2:

  1. Compare diversity between groups.

Part 1: Description and enumeration

Obtain a collection of “microbial” cells from “seawater”. The cells were concentrated from different depth intervals by a marine microbiologist travelling along the Line-P transect in the northeast subarctic Pacific Ocean off the coast of Vancouver Island British Columbia.

Sort out and identify different microbial “species” based on shared properties or traits. Record your data in this Rmarkdown using the example data as a guide.

Once you have defined your binning criteria, separate the cells using the sampling bags provided. These operational taxonomic units (OTUs) will be considered separate “species”. This problem set is based on content available at What is Biodiversity.

For example, load in the packages you will use.

#To make tables
library(kableExtra)
library(knitr)
#To manipulate and plot data
library(tidyverse)

Then load in the data. You should use a similar format to record your community data.

seawater_data = read.csv("candy-data.csv",
                           header = TRUE)
seawater_data_small = read.csv("candy-data_small.csv",
                               header = TRUE)

Finally, use these data to create a table.

seawater_data %>% 
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
X    name                occurances
1    m&m green           28
2    m&m red             28
3    m&m blue            60
4    m&m yellow          44
5    m&m brown           30
6    m&m orange          63
7    skittle brown       39
8    skittle red         33
9    skittle green       42
10   skittle orange      35
11   skittle yellow      23
12   gummi bear red      15
13   gummi bear pink     16
14   gummi bear green    18
15   gummi bear orange   15
16   gummi bear yellow   19
17   gummi bear white    16
18   m&i pink            39
19   m & i green         36
20   m&i yellow          27
21   m&i orange          32
22   m&ired              40
23   worms red           14
24   balls yellow        4
25   balls green         5
26   balls purple        3
27   balls orange        5
28   balls red           7
29   chocolate kiss      16
30   lego pink           7
31   lego yellow         5
32   lego blue           4
NA                       768
seawater_data_small %>%
  kable("html") %>%
  kable_styling(bootstrap_options = "striped", font_size = 10, full_width = F)
X    name                characteristics   occurances
1    green ball          NA                3
2    gummi bear green    NA                3
3    gummi bear red      NA                2
4    gummi bear yellow   NA                3
5    gummi clear         NA                1
6    gummi orange        NA                3
7    m & i green         NA                4
8    m & i orange        NA                1
9    m & i pink          NA                5
10   m & i red           NA                10
11   m & i yellow        NA                7
12   m & m blue          NA                12
13   m & m brown         NA                4
14   m & m green         NA                4
15   m & m orange        NA                11
16   m & m yellow        NA                5
17   m & m red           NA                2
18   pink brick          NA                1
19   purple ball         NA                2
20   red lines           NA                4
21   skittle brown       NA                2
22   skittle green       NA                10
23   skittle orange      NA                7
24   skittle purple      NA                5
25   skittle red         NA                4
26   skittle yellow      NA                6
NA                       NA                121

For your community:

  • Construct a table listing each species, its distinguishing characteristics, the name you have given it, and the number of occurrences of the species in the collection.
  • Ask yourself if your collection of microbial cells from seawater represents the actual diversity of microorganisms inhabiting waters along the Line-P transect. Were the majority of different species sampled or were many missed?

Part 2: Collector’s curve

To help answer the questions raised in Part 1, you will conduct a simple but informative analysis that is a standard practice in biodiversity surveys. This analysis involves constructing a collector’s curve that plots the cumulative number of species observed along the y-axis and the cumulative number of individuals classified along the x-axis. This curve is an increasing function with a slope that will decrease as more individuals are classified and as fewer species remain to be identified. If sampling stops while the curve is still rapidly increasing then this indicates that sampling is incomplete and many species remain undetected. Alternatively, if the slope of the curve reaches zero (flattens out), sampling is likely more than adequate.

To construct the curve for your samples, choose a cell within the collection at random. This will be your first data point, such that X = 1 and Y = 1. Next, move consistently in any direction to a new cell and record whether it is different from the first. In this step X = 2, but Y may remain 1 or change to 2 if the individual represents a new species. Repeat this process until you have proceeded through all cells in your collection.

For example, we load in these data.

collection_curve = read.csv("collection_curve.csv", 
                            header = FALSE)

And then create a plot. We will use a scatterplot (geom_point) to plot the raw data and then add a smoother to see the overall trend of the data.

ggplot(collection_curve, aes(x=V1, y=V2)) +
  geom_point() +
  geom_smooth() +
  labs(x="Cumulative number of individuals classified", y="Cumulative number of species observed")
## `geom_smooth()` using method = 'loess'
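
The tallying procedure described above (walk through the cells one at a time and count a species only the first time it appears) can also be sketched directly in R. This is a minimal illustration on made-up data: the species counts, the seed, and the name `curve_sketch` are assumptions for the example and are not the class data in collection_curve.csv.

```r
# Hypothetical sample: these species counts are made up for illustration
set.seed(425)
counts <- c(blue = 12, red = 10, green = 4, clear = 1)

# Lay the cells out in a random order, as if drawing them one at a time
cells <- sample(rep(names(counts), times = counts))

# A cell raises the species tally only the first time its label appears
is_new_species <- !duplicated(cells)

curve_sketch <- data.frame(
  individuals = seq_along(cells),   # x: cumulative individuals classified
  species = cumsum(is_new_species)  # y: cumulative species observed
)

tail(curve_sketch, 3)
```

The resulting two columns have the same shape as V1 and V2 in the example data, so they could be passed to the same ggplot call with the column names swapped in.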

For your sample:

  • Create a collector’s curve for your sample (not the entire original community).
  • Does the curve flatten out? If so, after how many individual cells have been collected?
  • What can you conclude from the shape of your collector’s curve as to your depth of sampling?

Part 3: Diversity estimates (alpha diversity)

Using the table from Part 1, calculate species diversity using the following indices or metrics.

Diversity: Simpson Reciprocal Index

\(\frac{1}{D}\) where \(D = \sum p_i^2\)

\(p_i\) = the fractional abundance of the \(i^{th}\) species

For example, for the total community (32 species, 768 individuals) followed by the individual sample (26 species, 121 individuals), \(\frac{1}{D}\) =

species1 = 28/(768)
species2 = 28/(768)
species3 = 60/(768)
species4 = 44/(768)
species5 = 30/768
species6 = 63/768
species7 = 39/768
species8 = 33/768
species9 = 42/768
species10 = 35/768
species11 = 23/768
species12 = 15/768
species13 = 16/768
species14 = 18/768
species15 = 15/768
species16 = 19/768
species17 = 16/768
species18 = 39/768
species19 = 36/768
species20 = 27/768
species21 = 32/768
species22 = 40/768
species23 = 14/768
species24 = 4/768
species25 = 5/768
species26 = 3/768
species27 = 5/768
species28 = 7/768
species29 = 16/768
species30 = 7/768
species31 = 5/768
species32 = 4/768
1 / (species1^2 + species2^2 + species3^2 + species4^2 + species5^2 + species6^2 + species7^2 + species8^2 + species9^2 + species10^2 + species11^2 + species12^2 + species13^2 + species14^2 + species15^2 + species16^2 + species17^2 + species18^2 + species19^2 + species20^2 + species21^2 + species22^2 + species23^2 + species24^2 + species25^2 + species26^2 + species27^2 + species28^2 + species29^2 + species30^2 + species31^2 + species32^2)
## [1] 22.18718
Species1 = 3/121
Species2 = 3/121
Species3 = 2/121
Species4 = 3/121
Species5 = 1/121
Species6 = 4/121
Species7 = 1/121
Species8 = 5/121
Species9 = 10/121
Species10 = 7/121
Species11 = 12/121
Species12 = 4/121
Species13 = 4/121
Species14 = 11/121
Species15 = 5/121
Species16 = 2/121
Species17 = 1/121
Species18 = 2/121
Species19 = 4/121
Species20 = 2/121
Species21 = 10/121
Species22 = 7/121
Species23 = 5/121
Species24 = 4/121
Species25 = 6/121
Species26 = 3/121

1/ (Species1^2 + Species2^2 + Species3^2 + Species4^2 + Species5^2 + Species6^2 + Species7^2 + Species8^2 + Species9^2 + Species10^2 + Species11^2 + Species12^2 + Species13^2 + Species14^2 + Species15^2 + Species16^2 + Species17^2 + Species18^2 + Species19^2 + Species20^2 + Species21^2 + Species22^2 + Species23^2 + Species24^2 + Species25^2 + Species26^2 )
## [1] 18.09765

The higher the value is, the greater the diversity. The maximum value is the number of species in the sample, which occurs when all species contain an equal number of individuals. Because the index reflects both the number of species present (richness) and the relative proportions of each species within a community (evenness), this metric is a diversity metric. Consider that two communities can have the same number of species (equal richness) but differ in how skewed the proportions of each species are (unequal evenness), which would result in different diversity values.

  • What is the Simpson Reciprocal Index for your sample?
  • 18.10
  • What is the Simpson Reciprocal Index for your original total community?
  • 22.19
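
The species-by-species arithmetic above can be condensed into a few lines by working on a vector of counts. This is a sketch rather than a required method; `inverse_simpson` is a hypothetical helper name, and the counts are copied from the two tables in Part 1 (totals rows excluded).

```r
# Simpson Reciprocal Index: 1 / sum(p_i^2), where p_i is each species' share
inverse_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

# Occurrences from the Part 1 tables, without the totals rows
community_counts <- c(28, 28, 60, 44, 30, 63, 39, 33, 42, 35, 23, 15, 16, 18,
                      15, 19, 16, 39, 36, 27, 32, 40, 14, 4, 5, 3, 5, 7, 16,
                      7, 5, 4)
sample_counts <- c(3, 3, 2, 3, 1, 3, 4, 1, 5, 10, 7, 12, 4, 4, 11, 5, 2, 1,
                   2, 4, 2, 10, 7, 5, 4, 6)

inverse_simpson(community_counts)  # about 22.19
inverse_simpson(sample_counts)     # about 18.10

# A perfectly even community returns its species count
inverse_simpson(rep(10, 5))        # 5
```

The last line checks the statement that the maximum value equals the number of species when every species is equally abundant.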
Richness: Chao1 richness estimator

Another way to calculate diversity is to estimate the number of species that are present in a sample based on the empirical data to give an upper boundary of the richness of a sample. Here, we use the Chao1 richness estimator.

\(S_{chao1} = S_{obs} + \frac{a^2}{2b}\)

\(S_{obs}\) = total number of species observed

\(a\) = number of species observed once

\(b\) = number of species observed twice or more

For the individual sample (26 species observed, 3 of them once and 23 twice or more) followed by the total community (32 species observed, none of them only once), \(S_{chao1}\) =

26 + 3^2/(2*23)
## [1] 26.19565
32 + 0^2/(2*32)
## [1] 32
  • What is the chao1 estimate for your sample?
  • 26.2
  • What is the chao1 estimate for your original total community?
  • 32
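
As a cross-check, the estimator can be written as a small function that counts singletons directly from the occurrences. This sketch follows the definition given above, where b is the number of species observed twice or more; many references instead define b as the number of species observed exactly twice, which would give a different value. `chao1_sketch` is a hypothetical helper name, and the counts are copied from the Part 1 tables without the totals rows.

```r
# Chao1 as defined in this problem set:
# S_obs + a^2 / (2 * b), with a = species seen once, b = species seen twice or more
chao1_sketch <- function(counts) {
  s_obs <- sum(counts > 0)
  a <- sum(counts == 1)
  b <- sum(counts >= 2)
  s_obs + a^2 / (2 * b)  # undefined if no species is seen at least twice
}

community_counts <- c(28, 28, 60, 44, 30, 63, 39, 33, 42, 35, 23, 15, 16, 18,
                      15, 19, 16, 39, 36, 27, 32, 40, 14, 4, 5, 3, 5, 7, 16,
                      7, 5, 4)
sample_counts <- c(3, 3, 2, 3, 1, 3, 4, 1, 5, 10, 7, 12, 4, 4, 11, 5, 2, 1,
                   2, 4, 2, 10, 7, 5, 4, 6)

chao1_sketch(sample_counts)     # three singletons: 26 + 9/46, about 26.2
chao1_sketch(community_counts)  # no singletons: 32
```

Because the total community has no species observed only once, the estimate adds nothing to the observed richness, whereas the three singletons in the sample push its estimate slightly above 26.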

Part 4: Alpha-diversity functions in R

We’ve been doing the above calculations by hand, which is a very good exercise to aid in understanding the math behind these estimates. Not surprisingly, these same calculations can be done with R functions. Since we just have a species table, we will use the vegan package. You will need to install this package if you have not done so previously.

library(vegan)

First, we must remove the unnecessary data columns and transpose the data so that vegan reads it as a species table with species as columns and samples as rows (of which you only have 1).

data_diversity = 
  seawater_data %>% 
  #Drop the totals row (X is NA); it is not a species
  filter(!is.na(X)) %>% 
  select(name, occurances) %>% 
  spread(name, occurances)

data_diversity
##   balls green balls orange balls purple balls red balls yellow
## 1           5            5            3         7            4
##   chocolate kiss gummi bear green gummi bear orange gummi bear pink
## 1             16               18                15              16
##   gummi bear red gummi bear white gummi bear yellow lego blue lego pink
## 1             15               16                19         4         7
##   lego yellow m & i green m&i orange m&i pink m&i yellow m&ired m&m blue
## 1           5          36         32       39         27     40       60
##   m&m brown m&m green m&m orange m&m red m&m yellow skittle brown
## 1        30        28         63      28         44            39
##   skittle green skittle orange skittle red skittle yellow worms red
## 1            42             35          33             23        14
small_data_diversity = 
  seawater_data_small %>% 
  #Drop the totals row (X is NA); it is not a species
  filter(!is.na(X)) %>% 
  select(name, occurances) %>% 
  spread(name, occurances)

small_data_diversity
##   green ball gummi bear green gummi bear red gummi bear yellow
## 1          3                3              2                 3
##   gummi clear  gummi orange m & i green m & i orange m & i pink m & i red
## 1            1            3           4            1          5        10
##   m & i yellow m & m blue m & m brown m & m green m & m orange m & m red
## 1            7         12           4           4           11         2
##   m & m yellow  pink brick purple ball red lines skittle brown 
## 1             5          1           2         4              2
##   skittle green skittle orange skittle purple skittle red skittle yellow
## 1            10              7              5           4              6

Then we can calculate the Simpson Reciprocal Index using the diversity function.

diversity(data_diversity, index="invsimpson")
## [1] 22.18718
diversity(small_data_diversity, index = "invsimpson")
## [1] 18.09765

And we can calculate the Chao1 richness estimator (and others by default) with the specpool function for extrapolated species richness. This function rounds to the nearest whole number so the value will be slightly different than what you’ve calculated above.

specpool(data_diversity)
##     Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All      32   32       0    32        0    32   32       0 1
specpool(small_data_diversity)
##     Species chao chao.se jack1 jack1.se jack2 boot boot.se n
## All      26   26       0    26        0    26   26       0 1

In Project 1, you will also see functions for calculating alpha-diversity in the phyloseq package since we will be working with data in that form.

For your sample:

  • What are the Simpson Reciprocal Indices for your sample and community using the R function?
  • Community: 22.19
  • Sample: 18.10
  • What are the chao1 estimates for your sample and community using the R function?
  • Community: 32
  • Sample: 26
    • Verify that these values match your previous calculations.

Part 5: Concluding activity

If you are stuck on some of these final questions, reading the Kunin et al. 2010 and Lundin et al. 2012 papers may provide helpful insights.

  • How does the measure of diversity depend on the definition of species in your samples?
    • The more specific the definition of species for our samples, the more diverse the samples appear to be. With a broader definition of what constitutes a species (e.g. grouping candy only by colour), the measured diversity would be lower.
  • Can you think of alternative ways to cluster or bin your data that might change the observed number of species?
    • As mentioned, grouping the candy only by shape or only by colour would change the observed number of species.
  • How might different sequencing technologies influence observed diversity in a sample?
    • Sequencing methods with higher error rates can overestimate the diversity and the number of species in a sample. For example, pyrosequencing errors that are not quality-filtered can artificially inflate diversity estimates (Kunin et al. 2010).
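
The effect of the species definition can be checked numerically by re-binning the total community from Part 1 by candy type alone, ignoring colour. This is a sketch under that assumed coarser definition; the `type` labels are my own grouping of the names in the Part 1 table, and `inverse_simpson` is a hypothetical helper.

```r
# Simpson Reciprocal Index on a vector of counts
inverse_simpson <- function(counts) {
  p <- counts / sum(counts)
  1 / sum(p^2)
}

# Total community from Part 1, with each row labelled by candy type only
candy <- data.frame(
  type = rep(c("m&m", "skittle", "gummi bear", "m&i", "worms", "balls",
               "chocolate kiss", "lego"),
             times = c(6, 5, 6, 5, 1, 5, 1, 3)),
  count = c(28, 28, 60, 44, 30, 63, 39, 33, 42, 35, 23, 15, 16, 18, 15, 19,
            16, 39, 36, 27, 32, 40, 14, 4, 5, 3, 5, 7, 16, 7, 5, 4)
)

# Coarser species definition: pool all colours of the same candy type
by_type <- tapply(candy$count, candy$type, sum)

nrow(candy)                   # 32 species when type and colour both count
length(by_type)               # 8 species when only type counts
inverse_simpson(candy$count)  # about 22.19
inverse_simpson(by_type)      # about 4.37
```

Both richness and the Simpson Reciprocal Index drop sharply under the coarser definition even though the underlying community is identical, which is the point made in the first answer above.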

Module 3 Essay

What defines a microbial species is still an ongoing debate in the research community. The challenge in defining a microbial species can be attributed to several factors. Unlike eukaryotes, prokaryotes are usually haploid organisms that reproduce asexually. They also cannot be easily distinguished by phenotypic traits alone. To circumvent this, researchers have attempted to supplement phenotypic approaches with genotypic ones, where organisms are considered to be the same species if at least 70% of their DNA hybridizes (1). However, this definition is complicated by horizontal gene transfer (HGT) events, where the uptake of genetic material from the environment can cause different microbial species to share more sequence homology as well as functional traits.

Functional metabolic genes are more likely to undergo HGT, leading to an increased reliance on the highly conserved 16S rRNA gene as sequencing techniques and metagenomic approaches become more advanced. The gene is a slowly evolving ‘phylogenetic’ anchor that is useful both for species identification and for establishing evolutionary relationships (2). Two microbes are generally considered to be the same species if their 16S rRNA genes have a sequence similarity of 97% or higher (3), but this approach to species identification is not faultless. Indeed, diversity in an environmental sample can be overestimated as a result of poor quality filtering of 16S rRNA pyrosequencing data (4). It was also shown that two organisms within the same genus can have 99% 16S rRNA gene homology but still be two different species (3). These issues indicate that a species classification approach that relies purely on 16S rRNA is potentially problematic and unreliable.

Perhaps our increased dependence on 16S rRNA in defining the concept of bacterial species is out of simplicity and convenience. Evolutionary pressures, along with the transfer of entire metabolic pathways by HGT (5), permit the creation of microbial species and of ecotypes: members of the same species that have evolved and adapted to a specific environment. Three E. coli strains illustrate this: the uropathogenic CFT073 and the enterohemorrhagic EDL933 occupy different niches of the body, whereas MG1655 is a nonpathogenic laboratory strain (6). Intriguingly, although only about 39% of their combined set of proteins is common to all three strains (6), they would be classified as the same species based on their 16S rRNA sequences and genomic backbone. The differences among their genomes lie largely in the genes they have acquired through HGT, which encode the pathogenic traits needed to occupy their specific niches. These strains of E. coli highlight how divergence driven by HGT, which 16S rRNA marker genes do not capture, could erode the microbial species definition.

While HGT blurs and complicates our attempts to define microbial species by distributing different metabolic pathways among members of the same species, the same mechanism has been instrumental in preserving the existence of certain metabolic pathways. Diversity of metabolic pathways is preserved over time as HGT distributes metabolic traits across different lineages and environments. One such example is the set of genes that encode the nitrogenase enzyme, which is evolutionarily favorable and is detected in many lineages of microorganisms because it allows them to use inorganic nitrogen for anabolism (7). Functional gene sets such as the nitrogenase genes are necessary for keeping nutrients flowing on Earth and, by extension, for maintaining biogeochemical cycles. Most of them likely originated from a large-scale genetic innovation that occurred roughly 3 billion years ago during the Archean eon (8). HGT played a key role in ensuring the survival of these functional genes after that event and in preserving them from being lost due to gene duplication and mass extinction events by distributing them across different ecological niches. Thus, HGT events directly influenced the state of biogeochemical cycles through the preservation of key metabolic pathways.

To summarize, there are two main approaches to defining microbial species: a purely genotypic approach or a functional approach. A genotypic approach, such as the extent to which the genomic DNA hybridizes or how similar the 16S rRNA sequences are, is straightforward but grossly oversimplifies the bacterial species concept. Ecotypes are evidence of this: organisms can be classified as the same species based on their 16S rRNA despite occupying different niches and having large functional differences due to HGT. On the other hand, we cannot define microbial species through their functional attributes alone either, because HGT has also distributed metabolic pathways across microbial species of different lineages. Despite the flaws in these approaches, it is necessary to have a microbial species definition in some instances. This is especially apparent in a medical setting, where physicians would otherwise be unable to diagnose diseases caused by pathogenic microbes or prescribe treatments for them. For practicality’s sake, it is probably best to combine the two approaches and adjust the definition according to the setting. It might be beneficial to have a “looser” definition in research, where bacterial species are grouped by genomic similarity as a starting point, and to keep the definition fluid until we can come to a consensus.

References

  1. Cho JC, Tiedje JM. 2001. Bacterial species determination from DNA-DNA hybridization by using genome fragments and DNA microarrays. Appl Environ Microbiol 67:3677–82.

  2. Krause L, Diaz NN, Goesmann A, Kelley S, Nattkemper TW, Rohwer F, Edwards RA, Stoye J. 2008. Phylogenetic classification of short environmental DNA fragments. Nucleic Acids Res 36:2230–2239.

  3. Nguyen NP, Warnow T, Pop M, White B. 2016. A perspective on 16S rRNA operational taxonomic unit clustering using sequence similarity. npj Biofilms Microbiomes.

  4. Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. 2010. Wrinkles in the rare biosphere: Pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12:118–123.

  5. Falkowski PG, Fenchel T, DeLong EF. 2008. The microbial engines that drive earth’s biogeochemical cycles. Science (80- ) 320:1034–1039.

  6. Welch RA, Burland V, Plunkett G, Redford P, Roesch P, Rasko D, Buckles EL, Liou S-R, Boutin A, Hackett J, Stroud D, Mayhew GF, Rose DJ, Zhou S, Schwartz DC, Perna NT, Mobley HLT, Donnenberg MS, Blattner FR. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A 99:17020–4.

  7. Falkowski PG. 1997. Evolution of the nitrogen cycle and its influence on the biological sequestration of CO2 in the ocean. Nature 387:272–275.

  8. David LA, Alm EJ. 2011. Rapid evolutionary innovation during an Archaean genetic expansion. Nature 469:93–96.

Project 1

htmltools::tags$iframe(title="MICB425 Project 1", src="Proj1.html", height=1000, width=1000)

Additional Readings

  1. Callahan BJ, McMurdie PJ, Holmes SP. 2017. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J 11:2639–2643. PMID28731476

  2. Gaudet AD, Ramer LM, Nakonechny J, Cragg JJ, Ramer MS. 2010. Small-group learning in an upper-level university biology class enhances academic performance and student attitudes toward group work. PLoS One 5. PMID21209910

  3. Hallam SJ, Torres-Beltrán M, Hawley AK. 2017. Monitoring microbial responses to ocean deoxygenation in a model oxygen minimum zone. Sci Data 4:1–3. PMID29087370

  4. Hawley AK, Brewer HM, Norbeck AD, Paša-Tolić L, Hallam SJ. 2014. Metaproteomics reveals differential modes of metabolic coupling among ubiquitous oxygen minimum zone microbes. Proc Natl Acad Sci 111:11395–11400. PMID25053816

  5. Kunin V, Engelbrektson A, Ochman H, Hugenholtz P. 2010. Wrinkles in the rare biosphere: Pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12:118–123. PMID19725865

  6. Cordero OX, Ventouras L-A, DeLong EF, Polz MF. 2012. Public good dynamics drive evolution of iron acquisition strategies in natural bacterioplankton populations. Proc Natl Acad Sci 109:20059–20064. PMID23169633

  7. Giovannoni SJ. 2012. Vitamins in the sea. Proc Natl Acad Sci 109:13888–13889.
  8. Lundin D, Severin I, Logue JB, Östman Ö, Andersson AF, Lindström ES. 2012. Which sequencing depth is sufficient to describe patterns in bacterial alpha- and beta-diversity? Environ Microbiol Rep 4:367–372. PMID23760801

  9. Morris JJ, Lenski RE, Zinser ER. 2012. The Black Queen Hypothesis: Evolution of Dependencies through Adaptative Gene Loss. MBio 3:1–7. PMID22448042

  10. Thompson JR, Pacocha S, Pharino C, Klepac-Ceraj V, Hunt DE, Benoit J, Sarma-Rupavtarm R, Distel DL, Polz MF. 2005. Genotypic diversity within a natural coastal bacterioplankton population. Science (80- ) 307:1311–1313. PMID15731455

  11. Welch RA, Burland V, Plunkett G, Redford P, Roesch P, Rasko D, Buckles EL, Liou S-R, Boutin A, Hackett J, Stroud D, Mayhew GF, Rose DJ, Zhou S, Schwartz DC, Perna NT, Mobley HLT, Donnenberg MS, Blattner FR. 2002. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci U S A 99:17020–4. PMID12471157

  12. Torres-Beltrán M, Hawley AK, Capelle D, Zaikova E, Walsh DA, Mueller A, Scofield M, Payne C, Pakhomova L, Kheirandish S, Finke J, Bhatia M, Shevchuk O, Gies EA, Fairley D, Michiels C, Suttle CA, Whitney F, Crowe SA, Tortell PD, Hallam SJ. 2017. A compendium of geochemical information from the Saanich Inlet water column. Sci Data 4:1–10. PMID29087371

  13. Welch DBM, Huse SM. 2011. Microbial Diversity in the Deep Sea and the Underexplored “Rare Biosphere.” Handb Mol Microb Ecol II Metagenomics Differ Habitats 243–252. PMID16880384

Module 4

Module 04 Portfolio Content

  • CATME final group assessment
    • Completion status:
    • Comments:
  • Project 2
    • Report (80%):
    • Participation (20%):

Project 2

htmltools::tags$iframe(title="MICB425 Project 2", src="Proj2.html", height=1000, width=1000)